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Abstract 

This document details the construction of a model for tracking a position and 
velocity state from GPS observations, with the intention of efficient, parallel online- 
learning of state-dependent parameters. Specifically, the on/off-road dynamic lin- 
ear model of Ulmke and Koch (2006)| is adapted to the Particle Learning frame- 




work of |H. F. Lopes, et. al.(20lT)T ~ 
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1 Introduction 




The availability and ease of collecting GPS data have created a rising interest in "location 
aware" applications, many of which also define unobservable states such as transportation 
modes (e.g. walking, biking, driving, etc.), traveled roads, actual location and velocity. 
Real-time GPS data is also being used in trip planning and to ascertain traffic conditions. 
Although GPS accuracy may be sufficient for direct use in many simple applications, 
the unobserved states are often discrete and not amenable to a simple and consistent 
determination. 

For example, when dealing with road networks, there are common cases in which 
"snapping" (when a GPS point is orthogonally projected onto a line corresponding to a 
street) is not sufficient in itself. Often a smoothing of an estimated states sequence is 
desired, e.g. when estimating a sequence of roads traveled. 

In the field of transportation there is also an interest in studying the quality of 
roadmap data. It is possible to ascertain this value by how well the observed data 
obey the restrictions implicit in the street data. Using a model that includes on/off- 
road states, one can identify and inspect high off-road likelihood data, which can signify 
problem areas. 
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Other use cases involve tracking multiple vehicles with varying properties, such that 
it may not be truly effective to guess or even batch estimate the parameter values over 
historical data prior to starting a real-time filter. For a system that runs day-to-day and 
adds or removes sources of observations regularly (or, similarly, has unreliable sources), 
adjustable, sequential parameter learning may be necessary. As well, data may need to 
be removed, or learned values augmented real-time. 

This paper provides an extensible Bayesian model and an efficient particle filter for the 
estimation of its states and parameters through the Particle Learning (PL) framework. 
We establish a conditional dynamic linear model (DLM) for on and off-road states motion 
states, and detail the PL estimation procedure. A method for estimating the on-road/off- 
road probabilities sequentially is provided, and the model is conditionally amenable to 
conjugate methods of covariance estimation. 

Many of aforementioned concerns and requirements are well met by techniques within 
the conditional DLM setting of West and Harrison(1997)] , and PL framework provides 



sequential approximate distributions of any estimated parameters. Commonly, sequen- 
tial Bayesian applications in this domain, have their parameters estimated beforehand, 
resort to parameter point estimates (e.g. Expectation Maximization), or linearizing (e.g. 
Exte nded Kalman Filters). See [D.B. Work et. al. (2010)| for an example of the latter, 
and L. Liao, D. Fox, H. Kautz(2004)| for the former. 



In such cases the additional information provided by approximating the full parameter 
distributions isn't available, and/or a departure from a Bayesian setting is made. Also, 
many such models do not clearly build off of standard motion models, which in the case 
of conditional DLM's are easy to extend in a hierarchical fashion. For some examples of 
possible conditional DLM motion models see |X. Rong Li and Vesselin P. Jilkov] . 

2 Model Construction 

2.1 Paths and Edges 

An edge is defined as 

A = {(1-5) l a + Sl u \6e [0,1]} 
for start and end coordinates l a , l u G M 2 , and distance along the edge 6 G M. 1 . Our 



construction follows the definition of segments in Ulmke and Koch (2006)| . 

A path is defined as an ordered collection of connected edges V = {A^, . . . , A^} so 
that \V\ = N. It is possible parameterize locations on the path by the distance traveled 
along all of the ordered edges, i.e. 

l + h^ 0<d<d x 
+ d,<d<d 2 

In-i + (In-i — In) D~dN-i ^n-i < d < D 
where the distance dk is given by dk = 52j=i \h ~ h-Ai an d = 0. 



1(d) 
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2.1.1 Evaluated Paths 



In what follows, we construct our distributions conditional on a path, V. Naturally, the 
results of our model will depend very much on the V we consider and, in application, 
it is usually impractical to evaluate the entire support of V . Also, it's often clear that 
the majority of V will have near zero likelihood, so it's reasonable to employ a technique 
for finding useful subsets to evaluate, like an efficient sampling procedure or a search 
algorithm. 

The approach used here is to employ a modified A* search between some measure 
of the current state and an area around the observation. In the DLM-based motion 
model one can simply use the observation covariance to determine an area for finding A* 
destination nodes. As well, other motion model information can inform the search, such 
as the projected distance, which can be used as a heuristic. 

2.1.2 Edge Transitions 

The generalized prior distribution for edges is a transition matrix, where now edges, A^, 
are interpreted to be a unique integer index and A^ = indicates no edge, or off-road. 

' \ on j >k and A (i) > and A (fc) = 
TT ff j >k and \& = and A^ > 
7r ( r j ' k) j > k and A^'),AW > (2) 
7r ( J' k) j>k and A^ = A<*> = 
otherwise 

for connected edges A^'\A^ G V and constants 7r on , 7r //, iTr' k \ nl?' k \ Basically, we've 
defined a simple on-off transition matrix, for which it is straight-forward to assign a 
Dirichlet prior to its probabilities. 

Although this particular case is simply a Beta-Binomial combination, the transition 
matrix formulation makes clear the fact that we can extend this further to cover all road 
transitions. 



p(Atf)|A( fc >,P) oc < 



"9 J ^ « 

other 
' c ) G V and consta 



V 



2.2 Movement 



In what follows we specify the probability of assignments to an edge in a path, motion 
along an edge and free movement. 

Given an edge A > 0, we track the distance traveled and velocity along the edges in 
V . We define the road-movement state vector as 



To convert between this vector of road-movement measures and a state vector of 2D coor- 
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dinates and 2D velocity, called ground-movement coordinates, one can use the transform 



Pi 



la(X) — L(A) 



Si = l a (\) - A A ^(A) 



U 



P x 
P* 
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v{ o 
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where a(X),uj(\) are indicate the start and end distances and positions on A, and = 
[0, 0] T , so that 

Vl,i 

V v ^ J 



An inverse mapping is d — P 



\,T , 



X 



s ) for x E 




Since the projection does not preserve the magnitude of velocity when going on-road, it is 
likely that using this transform will have undesired effects in practice, i.e. projecting onto 
an edge corresponding to a sharp turn can remove a large amount of velocity. Naturally, 



3n projec 

The transition equations and distribution 



one can preserve the magnitude of velocity when projecting on-road, if desired. 

r , i+1 ~N(0,a 2 r ) (3) 
r — 




1 
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2 
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For movement not on a path (i.e. A = 0), or when referring to ground-movement coordi- 
nates, we switch to a standard planar position-velocity model with 



x 

( Ki \ 

Vl,i 

All together, we have 



ndard pi 

f +1 = G g Xi + r 2 e g>i+1 , e 9;i+1 ~ N(0, a 2 I 2 ) 



(4) 



X; 



( 1 At, 

1 



\0 



\ 



1 At, 
1 / 



/ 


AU 2 
2 


o \ 




AU 










AU 2 




2 


V 








Xi 



P x xl + s x A > 
A = 



X 9 : 



To summarize, we have two states, x^,x?, straight-line kinematics over a path and free- 
movement, respectively, then we constructed a general motion state x iy which projects 
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the path kinematics to ground-coordinates when appropriate. 
Finally, with our generalized motion state, the observation equation is 

/ l o \ 




1 
\0 0/ 



,e tfji ~JV(0,^/ 2 ) 



2.2.1 Path Predictions and Posteriors 

We proceed to define the predictive distribution for a path V, and then the filter updates. 
In what follows, path V always starts with the previous edge \. Also, we define Zi to 
be the collection of observations and distribution parameters up to observation i, i.e. 
Zi = {i/i, JCi, Xi} where /Q are the Kalman filter parameters in ground coordinates, e.g. 
Ki = {m h Q} . 

First, let the road-movement prior predictive distribution be 

p(x r i+1 \Zi) ~ N(a i+1 ,R i+1 ) 

which we obtain from ([3]). The use of tildes on the mean and variance parameters signify 
road-coordinates. 

A straight-forward construction of the predictive distribution for a given path would 
involve strictly assigning the edge that corresponds to the predicted length, as in 

P(x r i+i, Xi+i\V,Zi) = p (x r i+1 \Zi) p (X i+1 \x r i+1 ,V) p(X i+1 \Xi,P) (5) 

otherwise 

where O r = (1,0) 

Following |Ulmke and Koch (2006)| , we replace the step function in ^ by a normal 
distribution centered on the edge and with a variance proportional to its length, 

P (A m \x r i+1 ,V) oc f N (d Xi+1 ;O r x r t+1 ,Adl. + ^j (7) 

where 

= (d w (X) + d a (x))/2, Ad x = (d u (A) - d«(A))/\/l2 

It is important to recall that the distance terms d\, Adx,d^\),d a ^x) are path dependent, 
such that d a n) is the distance up to the start of A on V . 

Moving on, we can derive a categorical prior predictive distribution over Aj + i G V 
using 

p(X i+1 \V,Zi) oc f N (d Xl+1 ;O r d i+1 ,O r R i+1 Oj + Ad 2 Xi+i ) 
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From here we can solve the prior prediction equations, in road-movement coordinates, 
for one edge with 



P (xl +1 \X i+1 ,V, Zi) oc p (\ i+1 \x r i+l ,P) p (x r i+1 \Z t ) 



so that 



p{xl +1 \\ i+ll V,Z l ) ~iv(a* +1 ,i^ 

and, as in |Ulmke and Koch (2006)] , we use the product formula for normals 

f N (x; Xy, Y)f N (y; z, Z) = f N (x; a, A)f N (y; b, B) 
b = z + W(x - Xz), a = Xz 
B = Z - WAW T 
W = ZX T A~ l 
A = XZX T + Y 



and obtain 



a$ +1 = a t + W x +l (d x - O 



W t x +1 = R i+1 OjS7^ 




Sf+i = O r R i+ iOj + Ad\ z 
We can now obtain the prior predictive distribution in ground-coordinates 

p{x i+1 \V,Zi) OC ^ P( x *+l\^i+l>'P> Z i)p(^i+l\'P> Z i)p(*i+l\K'P) 



p(x i+1 \X i+1 ,V, Zi) ~ N 



P A at_ 1 + 8 \I$, 1 = P A RL 1 P A 



Filtering is accomplished by conversion to ground coordinates and updating the state 
sufficient statistics, through normal Kalman filtering 

p(x i+ i\V,Z i+1 ) oc ^ p{K+iW,V,Zi) / p(y i+1 \x i+ i)p(x i+ i\X i+1 ,V,Z i ) 
x i+1 ev •^ Xi + 1 

oc p(\i + x\\i,V,Zi)p(xi +1 \\i +1 ,V,Z i+ i) 

where 

p (Aj + i|Ai, V, Z^ = p {\ i+1 \V, Zi) p(\ i+1 \Xi, V) 
p (x i+ i|A, Z i+1 ) ~ N (m x +1 , C x +l ) 
with posterior mean and covariance 



m 



,A _„A 



1 =at +1 + Af +1 (y i+1 -4 +1 ) 



CT + \' A + T g O g /al 

a A _pA /n T n _1 ' A 
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and predictive mean and covariance 

e i+i = Og a i+i 
Qi+l =Og R i+i°J + a p2 
and then conversion back to edge coordinates 

p(x r i+1 \P,Z i+1 ) oc Yl P^i+i\^V,Z i+1 )N(ml 1 = P^ T (ml 1 -s x ), C X +1 =P X ' T C X 

Finally, for off- road movement, filtering is performed in ground coordinates, using the 
standard Kalman filter updates. 



3 Estimation 



Since our state, Xj, is a DLM conditional on a path and edge, we are able to leverage Rao- 
Blackwellization and provide efficient updates to hyper-parameters. The likelihoods and 
posteriors are conditionally known, and we are able to sample directly from them, so the 
results are perfectly adapted ( |Pitt and Shephard (1999)] , |H. F. Lopes, et. al.(201lj] ). 
This provides a setting that is well suited for estimating hyper-parameters online, even 
when their estimation entails sampling and not simply conjugate updates. 

Considering the defined distributions, we have a situation very reminiscent of the 
tracking example in Jun S. Liu et. al.(2001)|, and the dynamic factor model with 



time- varying loadings in H. F. Lopes, et. al.(2011)| , which includes estimation for 



observation and state covariances. Through a similar design we attempt to relate 
their studied efficiencies and improvements to components of our model. Specifically, 
Jun S. Liu et. al.(2001)| demonstrates improvements of mixture Kalman filter (MKF) 



state estimation over plain SIS or particle filters, and H. F. Lopes, et. al.(2011)] shows 



the efficiency of state-transition and variance estimation in the PL setting for a MKF. 
These results have motivated our use of MKF/PL in what follows. 

We follow the CDLM section and switching model example in 
H. F. Lopes, et. al. (2011)] . Starting with N particles of Zi = {x^A^/Q} approx- 



imating p (xi, Xi, )Ci\yi) the particle filter steps are 

1. Re-sampling: Draw an index k? ~ Multi(wf\ . . . ,w[ ) with weights 

oc piy^K^X,)^) 

with 

p(Vi+i\fci, K) = Y1 Yl Ml/i+i; ejft 1 , Q^ji 1 ) p(X i+1 \Xi, P i+1 , Zi) 

2. Propagating state X: Draw AH_\ from 

p(A i+1 |(/Q, Xi,P l+1 ) {k3 \y l+1 ) oc fN(yi+i,4+i\Qi+i) P(\+i\k,Pi+i,Zi) 
Similarly, if there are any conditional dependencies on sample here. 
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3. Propagating state sufficient statistics, /Q+i.' Perform the Kalman filter recursions 
described in ([8]), or for = 0, the standard Kalman recursions. 

4. Propagating parameter sufficient statistics: Perform off-line update recursions for 
any estimated parameters, e.g. prior hyper-parameters for 7i on , iioff, 7T r , n g , of, o 2 g . 



4 Implementation 

In what follows we observe the performance of the Particle Learning filter described 
above against a simple Bootstrap filter. We are primarily concerned with basic accuracy, 
stability and the computational requirements for runs of decent length over a non-trivial 
graph. Secondly, through simulation we are able to directly measure the accuracy of 
parameter estimation. For our implementation we used a graph of Washington DC, and 
generated roughly 1000 of observations for each run. 

The Bootstrap filter estimates only the states, as described below, and the PL filter 
estimates the states and edge transitions. To do this, the distribution in (|2]) is par- 
titioned into two Bernoulli distributions with Beta distributed transition probabilities, 
under the restriction that iTr'^ = ir r , = 7r g (the same restriction is used in the 

simulations and bootstrap filter). Due to conjugacy, updates are straight-forward and 
performed in step [4] above. This construction follows the transition matrix in Example 3 
of [H. F. Lopes, et. al. (2011)1 . ^^^^ JT 



4.1 Bootstrap Filter and Simulation Procedure 

The bootstrap filter (BS) consists of sampling p(xi+i\xi,V) by first updating per ^ or 
Q, and including the error due to the state transition covariance. Then, for the case 
of A we move in the direction and length of the sampled movement, sampling edge 
transitions according to (J2| and (|6]). Simulations are produced in exactly the same way. 

Furthermore, when filtering, the effective sample size (ESS) is calculated for the 
propagated sample set and a condition is checked to determine if resampling is required. 
The condition was set to ESS < 0.9N, where N is the initial sample size. 

4.2 Setup 

Both models are run on the same simulated data set with varying particle sizes and run 
lengths. The true parameters were 

At = 30 
<J 2 y = 100 h 
a 2 r = 6.25 x 10~ 4 
a 2 g = 6.25 x 10~ 4 J 2 

7T g = 0.05, 7T on = 0.95 
1T ff = 0.05, 7T r = 0.95 



The BS filter was run using the true parameters as well as the PL filter with the exception 
of the 7r parameters, which were replaced by Beta parameters equaling 

( a off,off = 15, OL ff,on = 20) 
{aon,on = 70, a n,o// = 100) 

for the "off" and "on" distributions, respectively. 

Simulations were performed on an Intel Core i5 with 16 gigs of RAM on Ubuntu 12.01 
and using Java 6 OpenJDK amd64, and for each observation the root mean squared error 
(RMSE) of the position-velocity state vector, in ground-coordinates, was computed over 
the set of posterior particles. 

4.3 Results 

Figures [TJ [2] compare the log RMSE values for the two filters. Figure [I] shows that PL 
RMSE is lower, on average, than BS, across all particle sizes, but the most noticeable 
difference is how many fewer particles PL takes for its accuracy. Figures |4j [3] show 95% 
credibility intervals for the estimated ^off,off, ^on,on 

with a 2 g = 6.25 x 10~ 3 / 4 in both the 

simulations and PL filters. 

The online estimation of ^off^ff^on^n demonstrates some of the inherent complex- 
ities with having on and off-road states. For example, in Figure [3] we can see that 
estimation is biased upward in both terms. Observing the data directly, we often find 
that off-road states are not propagated due to the location in which the simulation goes 
off-road. Where there are many streets close together, an off-road state that lasts for one 
observation is easy to interpret as an on-road state, since there will likely exist connect- 
ing roads that fit with the true velocity and location. The same goes for cases in which 
an off-road state is traveling parallel to or, for all practical purposes, on a road. We 
would expect that longer off-road sojourn times would help differentiate, e.g. Figure [3j 
or noticeably different dynamics for off-road states, and they do. The off-road to off-road 
probability in Figure [3] is very small, so we can expect that staying off- road is very brief, 
and, given the large size of 7i on , the simulation doesn't go off-road very often. Figure [4] 
shows that increasing ir ff can help identify the signal. 

The results were computed in Java, synchronously, and, are naturally implementation 
dependent. Most of the time spent in the PL filter revolves around an A* computation 
of shortest paths. In the implementation used here, some caching was utilized to avoid 
repeated computations, and, of course, the BS filter does not require such computations. 
The BS filter is clearly the faster choice for low particle size, but referencing Figures [TJ 
[2] we can see that to achieve an RMSE comparable to the PL filter one needs to increase 
the particle size, which pulls the BS filter's throughput results to those of the PL filter 
at 25 particles. 

Regarding the filter comparison in generality, we're mostly seeing the documented 
benefit of using fully- adapted, Rao-Blackwellized filters, since the advantage of path- 
searching over path-sampling has been mostly diminished. To address this difference, 
the above tests were performed with parameters that restricted the distances moved 
between observations to values that do not push the limits of path-searching against 
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sampling. The simulations spend most of their time transitioning between 0-3 edges, so 
that simple edge sampling is unlikely to fail at finding the true paths. 

For models or data with larger state covariances, and thus more error due to accel- 
eration, the comparison between path generating methods may not be useful, as path- 
sampling models like the BS filter will quickly face issues. The same goes for settings in 
which the graph is more dense and/or paths are more restricted. In tests, when dead- 
end's are reached, or highly divergent paths are taken, we've observed that both models 
will eventually go off-road and find their way back to the correct path (this also accounts 
for some of the bias in 7r on , 7r ff estimation). This can be seen in the RMSE results, as in 
[2} where large values appear in series. Even though the simulation parameters are less 
likely to cause these events, the city graphs are non-trivial, so the behaviour is present, 
and, for the times in which it does occur, we can see that with increasing particle size 
the BS filter is less affected, as expected. 



5 Conclusion and Future Work 



The results demonstrate that the PL model is effective in this setting, and that it can 
perform well relative to a standard filter based on the exact data generating process for 
an actual city graph. The simulations were designed to approximate the behaviour of 
real data that were collected in Cebu and DC. Also, in real use cases, the PL filter's MAP 
results were overall accurate at inferring traveled segments and on/off states containing 
a mix of walking and biking in the DC area (not shown). 

Immediate future work will demonstrate the learning aspects, through covariance 
estimation and removing the restrictions on 7Tr' k \ 7r^' k \ as well as test the limits of accu- 
racy in varied acceleration settings. Comparisons with other sophisticated or specialized 
models may be of greater use in determining a comparative advantage to this design, so 
this line of research will be pursued. 

The model is being put to use in our work at OpenPlans.org, and is available as 



an open-soure project at https : //github . com/openplans/ open-tracking-tools The 



results in this paper were all produced with the aforementioned code, which is setup 
to filter custom csv data and produce simulations for any city using OpenTripPlanner 
graphs (opentripplan ner . org[ ) built with OpenStreetMap (www.openstreetmap.org) 
data. There is also a web-based GUI built into the library that allows for inspection of 
the filter and simulation results. 
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Figure 1: RMSE Comparisons 



12 




Figure 2: RMSE time series. 
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